(57) Abstract

A method for analyzing traffic data generated by a plurality of web servers, which host a single web site. The site is mirrored on
each server. A traffic data hit is generated responsive to each access of one of the servers. The hit includes data representing the time of
the access. Each data hit is stored in a log file (46A, 48A, 50A) on the server accessed. The first-stored data hit is read from each server.
The read data hits are compared (64), and the oldest data hit is passed to a log file analyzer (66). The next-stored data hit is read
from the server from which the passed data hit was read, and a second comparison is performed on the read data hits, with the oldest data
hit being passed to the log file analyzer. This process continues until all of the data hits are read, compared, and passed to the log file
analyzer. This results in passing all of the data hits to the log file analyzer in the chronological order in which the hits were generated.

SYSTEM AND METHOD FOR ANALYZING
WEB-SERVER LOG FILES

BACKGROUND OF THE INVENTION 
This invention relates generally to web-server traffic data analysis and more
particularly to a system and method for analyzing web-server log files.

The worldwide web (hereinafter "web") is rapidly becoming one of the most
important publishing mediums today. The reason is simple: web servers interconnected via
the Internet provide access to a potentially worldwide audience with a minimal investment in
time and resources in building a web site. The web server makes available for retrieval and
posting a wide range of media in a variety of formats, including audio, video and traditional
text and graphics. And the ease of creating a web site makes reaching this worldwide
audience a reality for all types of users, from corporations, to startup companies, to
organizations and individuals.

Unlike other forms of media, a web site is interactive and the web server can
passively gather access information about each user by observing and logging the traffic data
packets exchanged between the web server and the user. Important facts about the users can
be determined directly or inferentially by analyzing the traffic data and the context of the
"hit." Moreover, traffic data collected over a period of time can yield statistical information,
such as the number of users visiting the site each day, what countries, states or cities the users
connect from, and the most active day or hour of the week. Such statistical information is
useful in tailoring marketing or managerial strategies to better match the apparent needs of
the audience. Each hit is also encoded with the date and time of the access. Because the
statistical information of interest is virtually all related to time periods, accurately tracking
the time of each hit is critical.

To optimize use of this statistical information, web server traffic analysis must be
timely. However, it is not unusual for a web server to process thousands of users daily. The
resulting access information recorded by the web server amounts to megabytes of traffic data.
Some web servers generate gigabytes of daily traffic data. Analyzing the traffic data for even
a single day to identify trends or generate statistics is computationally intensive and
time-consuming. Moreover, the processing time needed to analyze the traffic data for several
days, weeks or months increases linearly as the time frame of interest increases.

The problem of performing efficient and timely traffic analysis is not unique to web
servers. Rather, traffic data analysis is possible whenever traffic data is observable and can
be recorded in a uniform manner, such as in a distributed database, client-server system or
other remote access environment.

Some web servers are so busy, i.e., handle so much traffic, that they require multiple 
servers to handle all of the traffic. Other users may need to employ multiple servers because 
of the large size of the web site. Critical sites, i.e., ones that cannot afford to be down 
because of a problem with a server, may also choose to deploy their site on multiple servers. 
Such multiple servers are sometimes referred to as a server farm. Server farms provide high 
bandwidth reliable access to web sites. 

There are several topologies that may be used in a server farm, but the most important 
ones divide the farm into clusters of servers. The web site is mirrored on each server within 
the cluster. Special hardware receives all of the traffic to the web site and distributes each hit 
to one of the servers. Some systems provide accurate load balancing in that all of the hits are 
rotated in sequence among each of the servers. But others assign each hit from a new source 
to a server, and further access to the site from that source is directed to the assigned server.
This is accomplished by assigning a predetermined time period, for example 30 minutes,
during which all future access from the same source is considered to be part of a single
session from that source. As described further below, the latter approach permits some
log-file analysis, which is not possible using the load-balancing technique.
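
By way of illustration only, the two distribution policies described above can be contrasted in a short Python sketch. The class names `RoundRobinRedirector` and `StickyRedirector`, the `route` method, the use of seconds for timestamps, and the choice to measure the 30-minute period from the moment a source is assigned are all assumptions made for the example; the text does not specify how an actual redirector implements either policy.

```python
from itertools import cycle

SESSION_SECONDS = 30 * 60  # the 30-minute example period from the text


class RoundRobinRedirector:
    """Rotates every hit through the servers in sequence."""

    def __init__(self, servers):
        self._next_server = cycle(servers)

    def route(self, source, timestamp):
        # The source is ignored: each hit simply goes to the next server.
        return next(self._next_server)


class StickyRedirector:
    """Assigns each new source to a server and keeps it there for one session."""

    def __init__(self, servers):
        self._next_server = cycle(servers)
        self._sessions = {}  # source -> (server, time at which the session expires)

    def route(self, source, timestamp):
        session = self._sessions.get(source)
        if session is not None and timestamp < session[1]:
            return session[0]  # still inside the session period
        server = next(self._next_server)
        self._sessions[source] = (server, timestamp + SESSION_SECONDS)
        return server


# Hypothetical hits: (source, seconds since some starting instant).
hits = [("a", 0), ("b", 10), ("a", 20), ("a", 30), ("b", 40), ("a", 2000)]

rotating = RoundRobinRedirector(["46", "48", "50"])
sticky = StickyRedirector(["46", "48", "50"])

rotated = [rotating.route(source, t) for source, t in hits]
pinned = [sticky.route(source, t) for source, t in hits]
print(rotated)  # ['46', '48', '50', '46', '48', '50']
print(pinned)   # ['46', '48', '46', '46', '48', '50']
```

Under rotation, the hits from source "a" are scattered across all three servers, so no single log file holds a complete session. Under the session-based policy, each session lands in one log file until the period lapses.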

Server farms, although providing load balancing and redundancy, present problems in 
analyzing the log files generated by the servers. Prior art systems for analyzing web-server 
log files can handle multiple log files, but these files are consecutively generated, i.e., the 
data packets within each log file are in chronological order and the log files themselves 
correspond to time periods containing data packets from within the periods. In other words, 
the log files are also consecutively generated. Log files on servers in a server farm, however, 
are concurrently generated. Each log file covers or overlaps the same time period. On server 
farms that rotate the hits among each server, log file analysis programs do not generate useful 
information. Brute force solutions are possible, such as sorting all of the log files and
creating a new single file, or copying all of the hits from each log file to a large database,
which can sort and analyze the data. These solutions have severe drawbacks: they are
computationally intensive, they require creation of large new files, and they are done only
after log files are complete, i.e., not on the fly while the log file is still being populated.

Server farms that assign hits from a new source to a single server can run prior art log
analysis programs on each server and sum the results. This, however, is not completely
accurate and is disadvantageous because it requires generation of separate reports that must
each be consulted or further manipulated to obtain information that applies to the entire
server farm.

There is consequently a need for a system and method for analyzing web-server log
files that are concurrently generated, such as those generated by a server farm.

There is a further need for such a system and method that can analyze the log files
substantially in real time.

There is still a further need for such a system that can analyze the log files without
generating new large files and without the need for substantial additional computing power.

There is also a need for such a system that can analyze log files whether they are 
concurrently or consecutively generated. 

SUMMARY OF THE INVENTION 

The present invention comprises a method for analyzing log files containing a 
plurality of data packets in sequence comprising: (a) selecting the first data packet in each 
log file; (b) comparing the selected data packets; (c) passing the oldest of the selected data 
packets to a log file analyzer; (d) selecting the next data packet in the log file in which the 
passed data packet was selected; and (e) repeating steps (b) through (d) until all of the data
packets in the log files are passed.
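
Steps (a) through (e) can be sketched, by way of example only, in a few lines of Python. The function name `merge_log_files` is hypothetical, each log file is modeled as an in-memory list, and each data packet is reduced to a bare numeric timestamp; these are simplifying assumptions for illustration and not the disclosed implementation.

```python
def merge_log_files(log_files):
    """Yield data packets from several time-ordered logs, oldest first."""
    cursors = [iter(log) for log in log_files]
    # (a) select the first data packet in each log file
    selected = [next(cursor, None) for cursor in cursors]
    while any(packet is not None for packet in selected):
        # (b) compare the currently selected data packets
        candidates = [
            (packet, index)
            for index, packet in enumerate(selected)
            if packet is not None
        ]
        oldest, source = min(candidates)
        # (c) pass the oldest packet on (here, to whoever consumes the generator)
        yield oldest
        # (d) select the next packet in the log file that supplied the passed one
        selected[source] = next(cursors[source], None)
        # (e) the loop repeats (b) through (d) until every log file is exhausted


# Hypothetical timestamps; each inner list is already in chronological order.
logs = [[10, 40], [20, 30, 60], [50]]
print(list(merge_log_files(logs)))  # [10, 20, 30, 40, 50, 60]
```

Only one pending packet per log file is held at any moment, so no combined file has to be built before analysis can begin.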

The foregoing and other features and advantages of the invention will become more 
readily apparent from the following detailed description of a preferred embodiment of the
invention, which proceeds with reference to the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram of a prior art system for analyzing traffic data in
a distributed computing environment according to the present invention.



WO 00/10093 PCT/US99/18282

FIG. 2 is a flow diagram of a prior art method for analyzing traffic data in a 
distributed computing environment according to the present invention using the system of 
FIG. 1. 

FIG. 3A shows a prior art format used in storing a "hit" of traffic data received by the
server of FIG. 1.

FIG. 3B shows, by way of example, a "hit" of formatted traffic data received by the
server of FIG. 1.

FIG. 4 is a schematic diagram of a server farm, including multiple servers like that
shown and described in FIG. 1.

FIG. 5 is a schematic diagram illustrating the operation of the server farm of FIG. 4.

FIG. 6 is a schematic diagram illustrating the present invention implemented in the
server farm of FIG. 4.

FIG. 7 is a schematic diagram similar to FIG. 6, but illustrating the present invention
operating on consecutive log files.

FIG. 8 is a flow diagram of a routine for implementing the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

FIG. 1 is a functional block diagram of a prior art system for analyzing traffic data in
a distributed computing environment 9. It is more fully described in "WebTrends Installation 
and User Guide," version 2.2, October 1996, and in U.S. Patent Application No. 08/801,707, 
the disclosures of which are incorporated herein by reference. WebTrends is a trademark of 
Webtrends Corporation, Portland, Oregon. 

A server 10 provides web site and related services to remote users. By way of
example, the remote users can access the server 10 from a remote computer system 12
interconnected with the server 10 over a network connection 13, such as the Internet or an
intranetwork, a dial up (or point-to-point) connection 14 or a direct (dedicated) connection
17. Other types of remote access connections are also possible.

Each access by a remote user to the server 10 results in a "hit" of raw traffic data 11.
The format used in storing each traffic data hit 11 and an example of a traffic data hit 11 are
described below with reference to FIGS. 3A and 3B, respectively. The server 10 preferably
stores each traffic data hit 11 in a log file 15, although a database 16 or other storage structure
can be used.

To analyze the traffic data, the server 10 examines each traffic data hit 11 and stores
the access information obtained from the traffic data as analysis results 18A-C. Five sources
of traffic data 11 (remote system 12, dial-up connection 14, log file 15, database 16 and direct
connection 17) are shown. Other sources are also possible. The traffic data hits 11 can
originate from any single source or from a combination of these sources. While the server 10
receives traffic data hits 11 continuously, separate sets of analysis results 18A-C are stored
for each discrete reporting period, called a time slice. The analysis results 18A-C are used
for generating summaries 19A-C of the access information.
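
The notion of keeping separate results per time slice can be illustrated with a small Python sketch. The function name `summarize_by_time_slice` is hypothetical, the slice is assumed to be one calendar day unless another format is given, and a simple hit count stands in for the richer analysis results 18A-C; none of these details is taken from the described system.

```python
from collections import Counter
from datetime import datetime


def summarize_by_time_slice(hit_times, slice_format="%Y-%m-%d"):
    """Count hits per time slice; the default slice is one calendar day."""
    return Counter(moment.strftime(slice_format) for moment in hit_times)


# Hypothetical access times for three hits.
hit_times = [
    datetime(1996, 1, 12, 20, 38, 17),
    datetime(1996, 1, 12, 23, 59, 59),
    datetime(1996, 1, 13, 0, 0, 1),
]

results = summarize_by_time_slice(hit_times)
hourly = summarize_by_time_slice(hit_times, "%Y-%m-%d %H")
print(dict(results))  # {'1996-01-12': 2, '1996-01-13': 1}
```

Because each slice is keyed by the time of the hit, a hit attributed to the wrong time would be counted in the wrong reporting period.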

In the described embodiment, the server 10 is typically an Intel Pentium-based
computer system equipped with a processor, memory, input/output interfaces, a network
interface, a secondary storage device and a user interface, preferably such as a keyboard and
display. The server 10 typically operates under the control of either the Microsoft Windows
NT or Unix operating systems and executes either Microsoft Internet Information Server or
Netscape Communications Server software. Pentium, Microsoft, Windows, Windows NT,
Unix, Netscape and Netscape Communications Server are trademarks of their respective
owners. However, other server 10 configurations varying in hardware, such as DOS-compatible,
Apple Macintosh, Sun Workstation and other platforms, in operating systems,
such as MS-DOS, Unix and others, and in web software are also possible. Apple, Macintosh,
Sun and MS-DOS are trademarks of their respective owners.

FIG. 2 is a flow diagram of a method 20 for analyzing traffic data in a distributed
computing environment according to the present invention using the system of FIG. 1. Its
purpose is to continuously collect and summarize access information from traffic data hits 11
while allowing on-demand, ad hoc analyses. The method 20 consists of two routines.
Access information is collected from traffic data hits 11 and summarized by the server 10
into analysis results 18A-C (block 21). The access information is separately analyzed for
generating the summaries 19A-C which identify trends, statistics and other information
(block 22). The collection and summarizing of the access information (block 21) is
performed continuously by the server 10 while the analysis of the access information (block
22) is performed on an ad hoc basis by either the server 10 or a separate workstation (not
shown).

The method 20 is preferably implemented as a computer program executed by the
server 10 and embodied in a storage medium comprising computer-readable code. In the
described embodiment, the method 20 is written in the C programming language, although
other programming languages are equally suitable. It operates in a Microsoft Windows
environment and can analyze Common Log File, Combined Log File and proprietary log file
formats from industry standard web servers, such as those licensed by Netscape, NCSA,
O'Reilly WebSite, Quarterdeck, C-Builder, Microsoft, Oracle, EMWAC, and other Windows
3.x, Windows NT 95, Unix and Macintosh Web servers. The analysis results 18A-C can be
stored in a proprietary or standard database 16 (shown in FIG. 1), such as SQL, BTRIEVE,
ORACLE, INFORMIX and others. The method 20 uses the analysis results 18A-C of traffic
data hits 11 as collected into the log file 15 or database 16 for building activity, geographic,
demographic and other summaries 19A-C, such as listed below in Table 1. Other summaries
19A-C are also possible.

Table 1.

General Statistics Table                      User Profile by Regions
Least Requested Pages                         Top Requested Pages
Top Exit Pages                                Top Entry Pages
Top Paths Through Site                        Single Access Pages
Advertising Clicks                            Advertising Views
Most Downloaded Files                         Advertising Views and Clicks
Most Active Countries                         Most Active Organizations
Activity Summary by Day                       Activity Summary by Day of Week
Activity Summary Level by Hours of the Day    Activity Summary by Hour of the Day
Client Errors                                 Web Server Statistics and Analysis
Server Errors                                 Top Downloaded File Types and Sizes
Top Directories Accessed                      Activity by Organization Type
Top Referring URLs                            Top Referring Sites
Netscape Browsers                             Top Browsers
Visiting Spiders                              Microsoft Explorer Browsers
                                              Top Platforms

In addition, the analysis results 18A-C can be used for automatically producing
reports and summaries which include statistical information and graphs showing, by way of
example, user activity by market, interest level in specific web pages or services, which
products are most popular, whether a visitor has a local, national or international origin and
similar information. In the described embodiment, the summaries 19A-C can be generated as
reports in a variety of formats. These formats include hypertext markup language (HTML)
files compatible with the majority of popular web browsers, proprietary file formats for use
with word processing, spreadsheet, database and other programs, such as Microsoft Word,
Microsoft Excel, ASCII files and various other formats. Word and Excel are trademarks of
Microsoft Corporation, Redmond, Washington.

FIG. 3A shows a format used in storing a "hit" of raw traffic data 11 received by the
server of FIG. 1. A raw traffic data hit 11 is not in the format shown in FIG. 3A. Rather, the
contents of each field in the format are determined from the data packets exchanged between
the server 10 and the source of the traffic data hit 11, and the information pulled from the data
packets is stored into a data record using the format of FIG. 3A prior to being stored in the
log file 15 (shown in FIG. 1) or processed.

Each traffic data hit 11 is a formatted string of ASCII data. The format is based on
the standard log file format developed by the National Center for Supercomputing
Applications (NCSA), the standard logging format used by most web servers. The format
consists of seven fields as follows:



Field Name                   Description

User Address (30):           Internet protocol (IP) address or domain name of the
                             user accessing the site.
RFC931 (31):                 Obsolete field usually left blank, but increasingly used
                             by many web servers to store the host domain name
                             for multi-homed log files.
User Authentication (32):    Exchanges the user name if required for access to the
                             web site.
Date/Time (33):              Date and time of the access and the time offset from
                             GMT.
Request (34):                Either GET (a page request) or POST (a form
                             submission) command.
Return Code (35):            Return status of the request which specifies whether the
                             transfer was successful.
Transfer Size (36):          Number of bytes transferred for the file request, that is,
                             the file size.

In addition, three optional fields can be employed as follows: 



Field Name                   Description

Referring Site (37):         URL used to obtain web site information for performing
                             the "hit."
Agent (38):                  Browser version, including make, model or version
                             number and operating system.
Cookie (39):                 Unique identifier permissively used to identify a
                             particular user.

Other formats of traffic data hits 11 are also possible, including proprietary formats
containing additional fields, such as time to transmit, type of service operation and others.
Moreover, modifications and additions to the formats of raw traffic data hits 11 are constantly
occurring and the extensions required by the present invention to handle such variations of
the formats would be known to one skilled in the art.

FIG. 3B shows, by way of example, a "hit" of raw traffic data received by the server
of FIG. 1. The User Address 30 field is "tarpon.gulf.net" indicating the user originates from a
domain named "gulf.net" residing on a machine called "tarpon." The RFC931 31 and User
Authentication 32 fields are "-" indicating blank entries. The Date/Time 33 field is
"12/Jan/1996:20:38:17 +0000" indicating an access on January 12, 1996 at 8:38:17 pm GMT.
The Request 34 field is "GET /general.htm HTTP/1.0" indicating the user requested the
"general.htm" page. The Return Code 35 and Transfer Size 36 fields are 200 and 3599,
respectively, indicating a successful transfer of 3599 bytes.
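
As a hedged illustration of how the Date/Time field can be extracted for later comparison, the following Python sketch parses a line assembled from the field values given above. The exact line layout (space-separated fields, bracketed timestamp, quoted request) is assumed to follow the conventional NCSA common log layout, since FIG. 3B itself is not reproduced here; the names `FIELD_NAMES`, `HIT_PATTERN` and `parse_hit` are hypothetical.

```python
import re
from datetime import datetime, timezone

# Hypothetical names for the seven fields 30 through 36, in order.
FIELD_NAMES = (
    "user_address", "rfc931", "user_authentication",
    "date_time", "request", "return_code", "transfer_size",
)

# Assumed layout: address, two tokens, [timestamp], "request", status, size.
HIT_PATTERN = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)


def parse_hit(line):
    """Split one log line into seven fields and decode the Date/Time field."""
    match = HIT_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognized hit: {line!r}")
    fields = dict(zip(FIELD_NAMES, match.groups()))
    # Decode the time and its GMT offset, then normalize to UTC so that hits
    # logged with different offsets can be compared directly.
    local_time = datetime.strptime(fields["date_time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["date_time"] = local_time.astimezone(timezone.utc)
    return fields


line = (
    'tarpon.gulf.net - - [12/Jan/1996:20:38:17 +0000] '
    '"GET /general.htm HTTP/1.0" 200 3599'
)
hit = parse_hit(line)
print(hit["user_address"], hit["date_time"].isoformat(), hit["transfer_size"])
```

The decoded `date_time` value is what a comparison between hits from different log files would be based on.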

Turning now to FIG. 4, indicated generally at 40 is a server farm constructed in
accordance with the present invention. Included therein are two server clusters 42, 44, which
include servers 46, 48, 50 and servers 52, 54, 56, respectively. Each of the servers
in clusters 42, 44 is substantially identical to server 10 in FIG. 1. In the present
embodiment, server cluster 42 hosts a first web site, which is mirrored on each of the servers
therein, at a single identified Internet Protocol (IP) address. The servers in cluster 44 host a
second web site, which is mirrored on each of the servers therein, at a second identified IP
address.

Each of the servers in clusters 42, 44 is connected via a cable, like cable 58, to a 
redirector 60. The redirector in turn receives an input from a network connection 62, which
in the present embodiment is an Internet connection. Redirector 60 is a prior art hardware 
device that receives a source of traffic data hits - in the present case, via connection 62 - and 
distributes them to the servers in clusters 42, 44. 

In the present implementation, redirector 60 distributes traffic data hits within each of
clusters 42, 44. In other words, the traffic data hits generated as a result of access to the web
site posted on cluster 42 are distributed among servers 46, 48, 50. Similarly, traffic data hits
produced by accessing the web site on cluster 44 are distributed among servers 52, 54, 56.
One device suitable for functioning as a redirector is manufactured by Cisco Systems and
sold under the name LocalDirector. Those having skill in the art will appreciate that other
known hardware devices can perform the function of redirector 60.

Turning now to FIG. 5, log files 46A, 48A, 50A are each stored on the server
corresponding to the numeral used to indicate the log file. These log files are generated and
stored in the manner described in connection with the server of FIG. 1. In FIG. 5, the hits are
numbered sequentially, hit number 1 through hit number 13, in the chronological order in
which each traffic data hit was generated. In the depiction of FIG. 5, each of log files 46A,
48A, 50A is still being added to. That is, in log file 46A, for example, hit number 1 is the
first-stored data hit, and hit number 5 is the next-stored data hit, with hit numbers 6 and 12
being thereafter stored in sequence. Because log file 46A is not yet full and it remains open,
additional hits may be stored in sequence after hit number 12. The same is true for log files
48A, 50A.

Turning now to FIG. 6, included therein is a sorter 64, which examines the hits in
sequence in each of the log files and passes them, in the chronological order in which each
hit was generated, to a log file analyzer 66. The log file analyzer operates generally as
described in connection with the server depicted in FIG. 1. Thereafter, the output of the
analyzer is stored as analysis results 18A-C, also as described in connection with FIG. 1.

The operation of sorter 64 can best be understood with reference to the following 
Table 2 and to the flow diagram depicted in FIG. 8. 



Table 2

    Compare           Passes
    1     2     4     1
    5     2     4     2
    5     3     4     3
    5     8     4     4
    5     8     7     5
    6     8     7     6
    12    8     7     7
    12    8     9     8
    12    10    9     9
    12    10    11    10

First, in block 68 of FIG. 8, the first record received in each log file 46A, 48A, 50A is
selected. This selection is depicted in Table 2, line 1, in which hits 1, 2, and 4 appear in the
Compare column. In block 70, sorter 64 compares each of hits 1, 2, and 4 and passes the
oldest (in time) record, namely hit 1 (block 72). The routine next determines, in block 74,
whether all the records in all of the log files have been selected, compared, and passed. If so,
the routine ends in block 76. If not, in block 78 the routine selects the next record in the log
file containing the record that was passed in block 72. In the example currently under
consideration, the next record is hit number 5 in log file 46A. Next, with reference to line 2
of Table 2, in block 70, hits 5, 2 and 4 are compared, and hit 2 is passed, it being the oldest
(in time) of the three records compared.
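
The walk-through above can be reproduced with a short Python trace, offered as a sketch rather than as the routine of FIG. 8 itself. The function name `trace_sorter` is hypothetical, hits are represented by their hit numbers, and the contents of the three logs are inferred from the Compare column of Table 2; hit number 13 is left out because the table does not reach it.

```python
def trace_sorter(log_files):
    """Return one (compared, passed) row per pass while merging ordered logs."""
    positions = [0] * len(log_files)
    rows = []
    while True:
        # The record currently selected in each log that still has records
        # (the outcome of blocks 68 and 78), keyed by log index.
        selected = {
            index: log[positions[index]]
            for index, log in enumerate(log_files)
            if positions[index] < len(log)
        }
        if not selected:
            return rows  # corresponds to the block 74 test: nothing left
        source = min(selected, key=selected.get)  # block 70: compare
        rows.append((tuple(selected.values()), selected[source]))  # block 72
        positions[source] += 1  # block 78: next record in the same log


# Assumed contents, inferred from Table 2 (hit 13 omitted).
log_46a = [1, 5, 6, 12]
log_48a = [2, 3, 8, 10]
log_50a = [4, 7, 9, 11]

rows = trace_sorter([log_46a, log_48a, log_50a])
for compared, passed in rows[:10]:
    print(compared, "->", passed)
```

The first ten rows printed match the ten lines of Table 2, beginning with `(1, 2, 4) -> 1` and `(5, 2, 4) -> 2`.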

Because the routine operates in first-in, first-out (FIFO) sequence on each of the log
files, it can process while the files remain open and continue to receive additional hits in
sequence.
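
One way to read from a log file that is still being appended to is to remember how far the file has already been consumed. The Python sketch below is an assumption-laden illustration: the class name `LogTail`, its `poll` method, and the rule of ignoring a partially written last line are not taken from the described routine.

```python
import os
import tempfile


class LogTail:
    """Returns only the complete lines appended since the previous poll."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # number of bytes already consumed

    def poll(self):
        with open(self.path, "rb") as log:
            log.seek(self.offset)
            data = log.read()
        complete = data.rfind(b"\n") + 1  # skip a half-written final line
        self.offset += complete
        return data[:complete].decode("ascii").splitlines()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "access.log")
    with open(path, "w") as log:
        log.write("hit 1\nhit 5\n")
    tail = LogTail(path)
    first = tail.poll()           # the two hits stored so far
    with open(path, "a") as log:
        log.write("hit 6\nhit 1")  # the second line is still being written
    second = tail.poll()          # only the completed line is returned
    with open(path, "a") as log:
        log.write("2\n")
    third = tail.poll()           # the finished line arrives on the next poll

print(first, second, third)
```

A sorter built on such a reader would presumably also have to decide how long to wait for a log file that currently has no pending hit before passing hits from the others; the text does not address that point.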

In the example of FIG. 7, log files 80, 82, 84 are operated on by sorter 64. It should
be noted that these log files include hits that are in sequential chronological order. What is
more, each of the log files is generated in chronological order. Thus, log file 80 represents an
identified time period, ranging between the time associated with hit 1 and the time of hit 4;
log file 82, ranging between the times of hits 5 and 8; and log file 84, between hits 9 and 12.
With reference again to FIG. 8, and to Table 3, which depicts the sequential comparisons
made on the log-file records in FIG. 7, hits 1, 5 and 9 are selected in block 68 and compared
in block 70. Hit 1, the oldest record, is passed in block 72 and the next-in record, hit number
2, is selected in block 78. This sequence continues until all of hits 1 through 12 are passed,
hits 1 through 4 being first passed in sequence from log file 80, hits 5 through 8 being next
passed in sequence from log file 82, and finally hits 9 through 12 in sequence from log file
84.

Table 3

    Compare           Passes
    1     5     9     1
    2     5     9     2
    3     5     9     3
    4     5     9     4
    -     5     9     5
    -     6     9     6
    -     7     9     7
    -     8     9     8
    -     -     9     9
    -     -     10    10
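
That a single merge routine handles both arrangements can be checked with Python's standard-library `heapq.merge`, which merges already-sorted inputs. It is used here as a stand-in for sorter 64, not as the disclosed routine. Hits are represented by their hit numbers; the concurrent lists are inferred from Table 2, and the variable names are illustrative.

```python
import heapq

# Consecutive log files of FIG. 7: each covers its own time period.
log_80 = [1, 2, 3, 4]
log_82 = [5, 6, 7, 8]
log_84 = [9, 10, 11, 12]

# Concurrent log files: overlapping time periods, each still in order.
log_46a = [1, 5, 6, 12]
log_48a = [2, 3, 8, 10]
log_50a = [4, 7, 9, 11]

consecutive = list(heapq.merge(log_80, log_82, log_84))
concurrent = list(heapq.merge(log_46a, log_48a, log_50a))

print(consecutive)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(concurrent)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

In the consecutive case the merge drains one file completely before touching the next, as in Table 3; in the concurrent case it alternates among the files. The caller does not need to know which case applies.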



The present invention therefore properly sorts traffic data hits in either
concurrently-generated or consecutively-generated log files. This is advantageous because it
obviates the need for separate routines or for configuring a program dependent upon whether
the log files are consecutive or concurrent. In addition, the present invention is capable of
sorting log files while they continue to receive and store new traffic data hits. This analysis
on the fly provides users with statistical data and reports on a near real time basis.

Numerous modifications and embodiments of the invention will be obvious to those
skilled in the art. Although the present invention has been described in terms of one
embodiment, this should not be interpreted as limiting. Various alterations, modifications and
combinations will no doubt become apparent to those skilled in the art after having read the
above disclosure. Accordingly, the appended claims should be interpreted as covering all
alterations and modifications that fall within the spirit and scope of the invention.

WE CLAIM:

1. A method for analyzing traffic data generated by a plurality of web servers
connected via a network to a plurality of computing devices comprising:

(a) generating a plurality of traffic data hits, each of said hits corresponding to a data
packet exchanged between one of the servers and one of the computing devices;

(b) storing each data hit in a log file on its associated server;

(c) reading the first-stored data hit from each server;

(d) comparing the read data hits;

(e) passing the oldest data hit;

(f) reading the next-stored data hit from the server from which the passed data hit was
read;

(g) repeating steps (d) through (e) until all of the data hits are read; and

(h) analyzing the passed data hits.

2. The method of claim 1 wherein steps (b) and (c) are performed substantially
simultaneously.

3. The method of claim 1 wherein step (b) is entirely performed prior to step (c).

4. The method of claim 1 wherein said web servers mirror one another.

5. The method of claim 1 wherein said traffic data hits are generated in
chronological sequence and wherein different ones of said log files contain data hits
corresponding to traffic data hits generated in the same time period.

6. A method for analyzing log files containing a plurality of data hits in
sequence, each of which corresponds to a traffic data hit generated by a web server, said
method comprising:

(a) selecting the first data hit in each log file;

(b) comparing the selected data hits;

(c) passing the oldest of the selected data hits to a log file analyzer;

(d) selecting the next data hit in the log file in which the passed data hit was selected;
and

(e) repeating steps (b) through (d) until all of the data hits in the log files are passed.

7. The method of claim 6 wherein said data hits are each associated with a
unique time and wherein the last record in one file is associated with a time later than the
first record in another file.

8. The method of claim 6 wherein said log files are each associated with a unique
time period and wherein the time for each data hit is within the period for its log file.

9. The method of claim 6 wherein one of said log files is generated by a first web
server and wherein another of said log files is generated by a second web server.

10. A method for analyzing web-server log files comprising:

generating traffic data hits representing actions on the web server;

generating a data hit associated with each traffic data hit, each data hit being
associated with a unique time;

storing the data hits in a plurality of log files;

sorting data hits from a plurality of the log files into chronological order; and

analyzing the sorted data hits.

11. The method of claim 10 wherein storing the data hits in sequence in a plurality
of log files comprises alternating storing the data hits into different log files.

12. The method of claim 11 wherein the data hits within each log file are stored in
sequence.

13. The method of claim 12 wherein sorting data hits from a plurality of the log files into
chronological order comprises:

(a) selecting the first data hit in each log file;

(b) comparing the selected data hits;

(c) passing the oldest of the selected data hits to a log file analyzer;

(d) selecting the next data hit in the log file in which the passed data hit was selected;
and

(e) repeating steps (b) through (d) until all of the data hits in the log files are passed.

14. A system for analyzing web-server log files comprising:

a source of traffic data hits generated by a web server, each of said data hits being
associated with a unique time;

a log file containing the data hits in sequence;

a sorter for sorting the data hits from a plurality of the log files into chronological
order; and

an analyzer for analyzing the sorted data hits.

15. The system of claim 14 wherein said sorter comprises:

means for selecting a data hit in each log file;

means for comparing the selected data hits; and

means for passing the oldest of the selected data hits to the analyzer.